Abstract

With massive amounts of data now produced on a regular basis, signal processing technology plays an increasingly important role. Signals are no longer restricted to physical sources; they now include digital sources as well.

Under the general assumption of discrete statistical signal sources, we propose a practical problem: sampling incomplete noisy signals about which we have no a priori knowledge, with a bounded sample size. We approach this sampling problem via Shannon's channel coding theorem, using an extremal binary channel with a high probability of transmission error, which is rare in communication theory. Our main result demonstrates that it is the large Walsh coefficient(s) that characterize(s) discrete statistical signals, regardless of the signal sources. This is already known in specific application domains such as images. Through the connection with Shannon's theorem, we establish, for the first time, the necessary and sufficient condition for our generic sampling problem. Finally, we discuss the cryptographic significance of the sparse Walsh transform.

Keywords. Walsh transform, Shannon’s channel coding theorem, 
channel capacity, extremal binary channel, generic sampling. 


1 Introduction 

With massive amounts of data now produced on a regular basis, we are confronted with the challenge of big data processing and analysis, and signal processing has become an increasingly important technology. An open question is how to sample signals about which we assume no prior knowledge. For practical reasons, sampling is affected by possibly strong noise and/or limited measurement precision. Assuming that the signal source is not restricted to a particular application domain, we are concerned with the practical and generic problem of sampling such noisy signals.

Our motivation arises from the following problem in modern applied statistics. We model discrete statistical signals in a general setting as follows. The samples, generated by an arbitrary (possibly noise-corrupted) source $F$, are $2^n$-valued for a fixed $n$. We assume that the noise source generates uniformly distributed samples¹. Note that our general setting of discrete statistical signals is captured by the assumption that $F$ is an arbitrary yet fixed (not necessarily deterministic) function. Testing for the presence of a real signal is known to be a hypothesis testing problem. Traditionally, $F$ is a deterministic function with small or medium input size, so it is computationally easy to collect the complete and precise distribution $f$ of $F$. Based on relative entropy (or Kullback-Leibler distance), the conventional approach (a.k.a. the classic distinguisher in statistical cryptanalysis [18, 19]) solves the sampling problem, given the distribution $f$ a priori. In reality, however, $F$ might be a function for which we do not have a complete description, it might be non-deterministic, or it might simply have a large input size. In such cases it is infeasible to collect the complete and precise distribution $f$. This gives rise to a new generic statistical sampling problem with discrete incomplete noisy signals, using a bounded number of samples.

In this work, we show that we can solve the generic sampling problem as reliably as possible without a priori knowledge. We approach this problem through a novel use of Shannon's channel coding theorem, which establishes the achievability of channel capacity. This allows us to obtain a simple, robust solution with an arbitrarily small probability of error. Note that in the conventional approach (i.e., the classic distinguisher), the problem statement is slightly different and the solution takes a different form. Our work uses a binary channel that is assumed to have an extremely high probability of transmission error; we call it the extremal binary channel. Such channels are rare in communication theory. In particular, for the Binary Symmetric Channel (BSC) with crossover probability $(1 - d)/2$ and small $d$ (i.e., $|d| \ll 1$), the channel capacity is approximately $d^2/(2 \log 2)$. Further, we construct a non-symmetric binary channel whose two input symbols are flipped with probabilities $(1 - d)/2$ and $1/2$, respectively (again with small $d$). We show that its channel capacity is approximately $d^2/(8 \log 2)$.

Our main contributions are as follows. First, we present the generic sampling theorem. We show that for this extremal non-symmetric binary channel, Shannon's channel coding theorem can solve the generic sampling problem under the general assumption of statistical signal sources (i.e., no further assumption is made about the signal sources). Specifically, we give, for the first time, the necessary and sufficient condition for sampling incomplete noisy signals with a bounded sample size for signal detection. Interestingly, the classical signal processing tool of the Walsh transform is essential: regardless of the real signal sources, the large Walsh coefficient(s) characterize(s) discrete statistical signals. Put another way, when sampling incomplete noisy signals of the same source multiple times, one can expect to see the same large Walsh coefficient(s), with the same magnitude(s), repeatedly at the same fixed frequency position(s). This is already known in specific application domains such as images and voice. Our result thus shows a strong connection between Shannon's theorem and the Walsh transform, both of which are key technologies in digital signal processing.

1 For the pure digital signal source $F$, which is our research subject throughout this work, this assumption is justified by the maximum entropy principle [6, p. 278].

Second, our generic sampling theorem is naturally linked to the new area of compressive sensing [7]. Compressive sensing is grounded in the sparse representation of signals in the transform domain. This enables powerful sampling techniques (with respect to both the number of time-domain components that must be accessed and the running time) for the purpose of signal recovery. In particular, the sparse Fourier transform has been the main research subject in this area, and studies on the sparse Walsh transform have followed more recently. Our preliminary work finds that, in the most general case, the sparse Walsh transform is linked (see [12, 13]) to the maximum likelihood decoding problem for linear codes, which is known to be NP-complete.

The rest of the paper is organized as follows. In Section 2, we give preliminaries on Walsh transforms. In Section 3, we review Shannon's channel coding theorem. In Section 4, we translate Shannon's theorem for extremal binary channels into hypothesis testing problems. Based on these results, we present our main sampling theorem in Section 5, where we also discuss its cryptographic significance. We give concluding remarks in Section 6.

2 Walsh Transforms in Statistics 

Given a real-valued function $f: GF(2)^n \to \mathbb{R}$ defined on $n$-tuple binary input vectors, the Walsh transform of $f$, denoted by $\hat{f}$, is another real-valued function defined as

$$\hat{f}(i) = \sum_{j \in GF(2)^n} (-1)^{\langle i, j \rangle} f(j) \tag{1}$$
for all $i \in GF(2)^n$, where $\langle i, j \rangle$ denotes the inner product of two $n$-tuple binary vectors $i, j$. For later convenience, we give an alternative definition below. Given an input array $x = (x_0, x_1, \ldots, x_{2^n - 1})$ of $2^n$ reals in the time domain, the Walsh transform $y = \hat{x} = (y_0, y_1, \ldots, y_{2^n - 1})$ of $x$ is defined by

$$y_i = \sum_{j \in GF(2)^n} (-1)^{\langle i, j \rangle} x_j$$

for any $n$-tuple binary vector $i$. We call $x_i$ (resp. $y_i$) the time-domain component (resp. transform-domain coefficient) of a signal of size $2^n$. For basic properties of Walsh transforms and further references, we refer to the standard literature.
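A minimal sketch of definition (1) evaluated term by term. It assumes (our convention, not the paper's) that an $n$-tuple binary vector is packed into the bits of an integer, so that $\langle i, j \rangle$ is the parity of `i & j`; the helper names are hypothetical.

```python
from __future__ import annotations


def inner(i: int, j: int) -> int:
    """Inner product <i, j> over GF(2) of two n-bit vectors packed into integers."""
    return bin(i & j).count("1") & 1


def walsh_naive(x: list[float]) -> list[float]:
    """Walsh transform y_i = sum_j (-1)^<i,j> x_j, evaluated directly in O(4^n)."""
    size = len(x)
    if size == 0 or size & (size - 1):
        raise ValueError("input length must be a power of two, 2^n")
    return [sum((-1) ** inner(i, j) * x[j] for j in range(size)) for i in range(size)]


# Usage: the uniform distribution on 2-bit vectors has only the trivial coefficient.
print(walsh_naive([0.25, 0.25, 0.25, 0.25]))  # [1.0, 0.0, 0.0, 0.0]
```

The direct evaluation costs $O(4^n)$ operations and is mainly useful as a reference against which faster transforms can be checked.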
Let $f$ be the probability distribution of an $n$-bit random variable $X = (X_n, X_{n-1}, \ldots, X_1)$, where each $X_i \in \{0, 1\}$. Then $\hat{f}(m)$ is the bias of the Boolean variable $\langle m, X \rangle$ for any fixed $n$-bit vector $m$, which is often called the output pattern or mask. Recall that a Boolean random variable $A$ has bias $\epsilon$ defined by $\epsilon = E[(-1)^A] = \Pr(A = 0) - \Pr(A = 1)$. Hence, if $A$ is uniformly distributed, $A$ has bias 0. The pattern $m$ should be nonzero, since $\hat{f}(0) = \sum_j f(j) = 1$ holds trivially for every distribution.
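The bias interpretation can be checked numerically. The sketch below (hypothetical helper names, same integer bit-packing assumption) computes the bias of $\langle m, X \rangle$ directly from a toy distribution $f$ and compares it with $\hat{f}(m)$; the mask $m = 0$ returns the trivial value 1.

```python
from __future__ import annotations


def parity(v: int) -> int:
    """Parity of the bits of v, i.e. the GF(2) sum of its coordinates."""
    return bin(v).count("1") & 1


def mask_bias(f: list[float], m: int) -> float:
    """Bias Pr(<m,X> = 0) - Pr(<m,X> = 1) of the Boolean variable <m, X>, X ~ f."""
    p_zero = sum(p for x, p in enumerate(f) if parity(m & x) == 0)
    p_one = sum(p for x, p in enumerate(f) if parity(m & x) == 1)
    return p_zero - p_one


def walsh_coefficient(f: list[float], m: int) -> float:
    """Single Walsh coefficient f_hat(m) = sum_x (-1)^<m,x> f(x), as in (1)."""
    return sum((-1) ** parity(m & x) * p for x, p in enumerate(f))


# Toy distribution of a 2-bit X whose two bits are equal with probability 3/4.
f = [0.375, 0.125, 0.125, 0.375]
for m in range(4):
    print(m, mask_bias(f, m), walsh_coefficient(f, m))
```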

Walsh transforms have been used in statistics to find dependencies within a multi-variable data set. In multi-variable tests, each $X_i$ indicates the presence or absence (represented by '1' or '0') of a particular feature in a pattern recognition experiment. The Fast Walsh Transform (FWT) is used to obtain all coefficients $\hat{f}(m)$ at once. By checking the Walsh coefficients one by one and identifying the large² ones, we can tell the dependencies among the $X_i$'s.
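The FWT is commonly implemented as an in-place butterfly. The following is a sketch of that standard construction (names are ours). It computes all $2^n$ coefficients in $O(n 2^n)$ additions and then reports the nonzero mask whose coefficient is largest in absolute value, following the convention of footnote 2.

```python
from __future__ import annotations


def fwt(x: list[float]) -> list[float]:
    """Unnormalized fast Walsh transform: y_i = sum_j (-1)^<i,j> x_j in O(n 2^n)."""
    y = list(x)
    size = len(y)
    if size == 0 or size & (size - 1):
        raise ValueError("input length must be a power of two, 2^n")
    h = 1
    while h < size:
        # Combine pairs (k, k + h) whose indices differ only in bit log2(h).
        for start in range(0, size, 2 * h):
            for k in range(start, start + h):
                a, b = y[k], y[k + h]
                y[k], y[k + h] = a + b, a - b
        h *= 2
    return y


def largest_nontrivial(f: list[float]) -> tuple[int, float]:
    """Nonzero mask m maximizing |f_hat(m)|, together with the coefficient itself."""
    coeffs = fwt(f)
    m = max(range(1, len(coeffs)), key=lambda i: abs(coeffs[i]))
    return m, coeffs[m]


f = [0.375, 0.125, 0.125, 0.375]
print(fwt(f))                 # [1.0, 0.0, 0.0, 0.5]
print(largest_nontrivial(f))  # (3, 0.5)
```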

3 Review on Shannon’s Channel Coding Theorem 

We briefly review Shannon's famous channel coding theorem (cf. [6]). First, we recall the basic definitions of Shannon entropy. The entropy $H(X)$ of a discrete random variable $X$ with alphabet $\mathcal{X}$ and probability mass function $p(x)$ is defined by

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x).$$

2 We use the convention in signal processing to refer to the large transform-domain 
coefficient d as the one with a large absolute value throughout the paper. 
The joint entropy $H(X_1, \ldots, X_n)$ of a collection of discrete random variables $(X_1, \ldots, X_n)$ with joint distribution $p(x_1, x_2, \ldots, x_n)$ is defined by

$$H(X_1, \ldots, X_n) = -\sum_{x_1, \ldots, x_n} p(x_1, x_2, \ldots, x_n) \log_2 p(x_1, x_2, \ldots, x_n).$$

Define the conditional entropy $H(Y \mid X)$ of a random variable $Y$ given another random variable $X$ as

$$H(Y \mid X) = \sum_{x} p(x) H(Y \mid X = x).$$

The mutual information $I(X; Y)$ between two random variables $X, Y$ is equal to $H(Y) - H(Y \mid X)$, which always equals $H(X) - H(X \mid Y)$. A communication channel is a system in which the output $Y$ depends probabilistically on its input $X$. It is characterized by a probability transition matrix that determines the conditional distribution of the output given the input.
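A small sketch of these definitions for finite alphabets. It assumes the channel is given as a row-stochastic matrix with `channel[x][y]` $= p(y \mid x)$, a representation chosen here for illustration; the function names are ours.

```python
from __future__ import annotations

import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits, -sum p log2 p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(px: list[float], channel: list[list[float]]) -> float:
    """I(X;Y) = H(Y) - H(Y|X) for input distribution px and channel[x][y] = p(y|x)."""
    ny = len(channel[0])
    py = [sum(px[x] * channel[x][y] for x in range(len(px))) for y in range(ny)]
    h_y_given_x = sum(px[x] * entropy(channel[x]) for x in range(len(px)))
    return entropy(py) - h_y_given_x


# A noiseless binary channel carries 1 bit; a channel that ignores its input carries 0.
print(mutual_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
print(mutual_information([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]))
```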

Theorem 1 (Shannon's Channel Coding Theorem). Given a channel, denote its input and output by $X$ and $Y$, respectively. We can send information at the maximum rate of $C$ bits per transmission with an arbitrarily low probability of error, where $C$ is the channel capacity defined by

$$C = \max_{p(x)} I(X; Y), \tag{2}$$

and the maximum is taken over all possible input distributions $p(x)$.
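Definition (2) can be evaluated numerically for binary-input channels. Since $I(X; Y)$ is concave in the input distribution, a one-dimensional ternary search over $p_0 = p(x = 0)$ suffices. The sketch below assumes binary input and binary output, and its function names are hypothetical.

```python
from __future__ import annotations

import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def binary_input_mi(p0: float, channel: list[list[float]]) -> float:
    """I(X;Y) for a binary-input, binary-output channel with Pr(X = 0) = p0."""
    py0 = p0 * channel[0][0] + (1.0 - p0) * channel[1][0]
    return h2(py0) - (p0 * h2(channel[0][0]) + (1.0 - p0) * h2(channel[1][0]))


def capacity(channel: list[list[float]], iters: int = 200) -> tuple[float, float]:
    """Maximize I(X;Y) over p0 in [0, 1] by ternary search (I is concave in p0)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if binary_input_mi(a, channel) < binary_input_mi(b, channel):
            lo = a  # a maximizer lies in [a, hi]
        else:
            hi = b  # a maximizer lies in [lo, b]
    p0 = (lo + hi) / 2
    return p0, binary_input_mi(p0, channel)


print(capacity([[0.9, 0.1], [0.1, 0.9]]))  # BSC with crossover probability 0.1
print(capacity([[1.0, 0.0], [0.5, 0.5]]))  # a non-symmetric (Z-shaped) channel
```

For the BSC the search returns $p_0 \approx 1/2$ and agrees with (3) below; for a non-symmetric channel the optimal $p_0$ generally differs from $1/2$.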

For the binary symmetric channel (BSC) with crossover probability³ $p$, $C$ can be expressed as (cf. [6]):

$$C = 1 - H(p) \text{ bits/transmission}. \tag{3}$$

Herein, we refer to the BSC with crossover probability $p = (1 + d)/2$ and small $d$ (i.e., $|d| \ll 1$) as an extremal BSC. We can prove the following for the channel capacity of an extremal BSC (see the Appendix for the proof):

Corollary 1 (extremal BSC). Given a BSC with crossover probability $p = (1 + d)/2$, if $d$ is small (i.e., $|d| \ll 1$), then $C \approx c_0 d^2$, where the constant $c_0 = 1/(2 \log 2)$ and $\log$ denotes the natural logarithm.

Therefore, for an extremal BSC, we can send one bit with an arbitrarily low probability of error using the minimum number of transmissions $1/C = (2 \log 2)/d^2$, i.e., $O(1/d^2)$. In the next section, we translate Corollary 1 into two useful statistical results. Interestingly, in communication theory this extremal BSC is rare because of its low efficiency; one typically deals with $|d|$ far from 0.
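A quick numerical check of Corollary 1 and of the transmission count $1/C$, written as a sketch (helper names are ours): the exact capacity $1 - H((1 + d)/2)$ is compared with $d^2/(2 \log 2)$, where log is the natural logarithm.

```python
import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def bsc_capacity(p: float) -> float:
    """Exact BSC capacity C = 1 - H(p), as in (3)."""
    return 1.0 - h2(p)


def extremal_bsc_approx(d: float) -> float:
    """Corollary 1: C ~ d^2 / (2 log 2), with log the natural logarithm."""
    return d * d / (2.0 * math.log(2.0))


for d in (0.1, 0.01, 0.001):
    exact = bsc_capacity((1.0 + d) / 2.0)
    approx = extremal_bsc_approx(d)
    # The ratio tends to 1 as d -> 0; 1/C is the minimum number of transmissions per bit.
    print(f"d={d}: C={exact:.3e} approx={approx:.3e} ratio={exact / approx:.6f} 1/C={1 / exact:.4g}")
```

By (7) in the Appendix the printed ratio behaves like $1 + d^2/6$, so the approximation is already accurate to about 0.2% at $d = 0.1$.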

3 That is, the input symbols are complemented with probability $p$.
4 Statistical Translations of Shannon's Theorem


Let $X_0, X_1$ denote Boolean random variables with bias $+d$ and $-d$, respectively (we restrict ourselves to $|d| \ll 1$). Denote the probability distributions of $X_0, X_1$ by $D_0, D_1$, respectively. Let $D \in \{D_0, D_1\}$. We are given a binary sequence of random bits of length $N$, where each bit is independent and identically distributed (i.i.d.) according to $D$. As a consequence of Shannon's channel coding theorem, we now solve a hypothesis testing problem in statistics: determine the minimum $N$ required to decide whether $D = D_0$ or $D = D_1$ with an arbitrarily low probability of error.

We translate this problem into a BSC channel coding problem as follows. The inputs are transmitted through a BSC with error probability $p = (1 - d)/2$. By Shannon's channel coding theorem, with a minimum of $N = 1/C$ transmissions, we can reliably (i.e., with an arbitrarily low probability of error) determine whether the input is '0' or '1'. The former case implies that the received sequence follows the distribution $D_0$ (i.e., a bit '1' occurs in the output sequence with probability $p$), while the latter case implies that the received sequence follows the distribution $D_1$ (i.e., a bit '0' occurs in the output sequence with probability $p$). This solves the problem stated above. Using Corollary 1, which applies equally to $p = (1 - d)/2$ by the symmetry $H(p) = H(1 - p)$, we have $N = (2 \log 2)/d^2$, i.e., $O(1/d^2)$, for $|d| \ll 1$. Thus, we have just shown that Shannon's Channel Coding Theorem can be translated to solve the following hypothesis testing problem:

Theorem 2. Assume that the Boolean random variables $A, B$ have bias $+d$ and $-d$, respectively, and that $d$ is small. We are given a sequence of random samples that are i.i.d. according to the distribution of either $A$ or $B$. We can tell the sample source with an arbitrarily low probability of error using the minimum number of samples $N = (2 \log 2)/d^2$, i.e., $O(1/d^2)$.
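The channel-coding argument does not prescribe a concrete decision rule. As one natural choice (our assumption, not the paper's construction), the sketch below decides between bias $+d$ and $-d$ by the sign of the empirical bias. Because a finite test of this kind still errs with small probability, the simulation uses a hypothetical safety factor times the bound $(2 \log 2)/d^2$.

```python
from __future__ import annotations

import math
import random


def draw_bits(bias: float, n: int, rng: random.Random) -> list[int]:
    """n i.i.d. bits with Pr(0) - Pr(1) = bias."""
    p_one = (1.0 - bias) / 2.0
    return [1 if rng.random() < p_one else 0 for _ in range(n)]


def empirical_bias(bits: list[int]) -> float:
    """Estimate of E[(-1)^A] from the observed bits."""
    return sum(1 - 2 * b for b in bits) / len(bits)


def decide_sign(bits: list[int]) -> str:
    """Return 'A' (bias +d) if the empirical bias is nonnegative, otherwise 'B' (bias -d)."""
    return "A" if empirical_bias(bits) >= 0 else "B"


d = 0.05
n_min = 2 * math.log(2) / d**2   # Theorem 2 bound: about 555 samples
factor = 40                      # hypothetical safety factor for this finite-sample test
n = math.ceil(factor * n_min)
rng = random.Random(2024)
print(round(n_min), n, decide_sign(draw_bits(+d, n, rng)), decide_sign(draw_bits(-d, n, rng)))
```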

Further, the following variant is more frequently encountered in hypothesis testing, where we have to distinguish a biased distribution from a uniform distribution.

Theorem 3. Assume that the Boolean random variable $A$ has bias $d$ and that $d$ is small. We are given a sequence of random samples that are i.i.d. according to either the distribution of $A$ or the uniform distribution. We can tell the sample source with an arbitrarily low probability of error using the minimum number of samples $N = (8 \log 2)/d^2$, i.e., $O(1/d^2)$.
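Similarly, a simple sketch for Theorem 3 (again our own choice of rule, not the paper's) declares the source biased when the empirical bias is closer to $d$ than to 0. The bound is four times that of Theorem 2, reflecting the capacity $d^2/(8 \log 2)$ instead of $d^2/(2 \log 2)$; the safety factor is again a hypothetical parameter.

```python
from __future__ import annotations

import math
import random


def draw_bits(bias: float, n: int, rng: random.Random) -> list[int]:
    """n i.i.d. bits with Pr(0) - Pr(1) = bias (bias 0 gives uniform bits)."""
    p_one = (1.0 - bias) / 2.0
    return [1 if rng.random() < p_one else 0 for _ in range(n)]


def decide_biased(bits: list[int], d: float) -> bool:
    """True if the empirical bias is closer to d (source A) than to 0 (uniform source)."""
    est = sum(1 - 2 * b for b in bits) / len(bits)
    return abs(est - d) < abs(est)


d = 0.05
n_min = 8 * math.log(2) / d**2   # Theorem 3 bound: about 2218 samples, 4x that of Theorem 2
n = 40 * math.ceil(n_min)        # hypothetical safety factor, since a finite test still errs
rng = random.Random(2024)
print(round(n_min), decide_biased(draw_bits(d, n, rng), d), decide_biased(draw_bits(0.0, n, rng), d))
```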

Proof. Clearly, the BSC construction used in the proof of Theorem 2 does not work here, as the biases of the two sources (i.e., $d$ and 0, respectively) are non-symmetric. We therefore use Shannon's channel coding theorem with a non-symmetric binary channel rather than a BSC. Consider the channel with the transition matrix

$$\begin{pmatrix} 1 - p_e & p_e \\ 1/2 & 1/2 \end{pmatrix},$$

where $p_e = (1 - d)/2$ and $d$ is small. The entry in the $x$-th row and the $y$-th column ($x, y \in \{0, 1\}$) denotes the conditional probability that $y$ is received when $x$ is sent. So the input bit 0 is transmitted with error probability $p_e$ (i.e., the received sequence has bias $d$ if the input symbols are 0), and the input bit 1 is transmitted with error probability 1/2 (i.e., the received sequence has bias 0 if the input symbols are 1).

In general, no closed-form solution exists for the channel capacity $C$ (i.e., the maximum) defined in (2). Nonlinear optimization algorithms are known to find a numerical solution. Below, we propose a simple method to give a closed-form estimate of $C$ for our extremal binary channel. As $I(X; Y) = H(Y) - H(Y \mid X)$, we first compute $H(Y)$ by

$$H(Y) = H\big(p_0 (1 - p_e) + (1 - p_0)/2\big) = H\Big(\frac{1}{2} + p_0\Big(\frac{1}{2} - p_e\Big)\Big), \tag{4}$$

where $p_0$ denotes $p(x = 0)$ for short, and $H(\cdot)$ applied to a probability denotes the binary entropy function as in (3). Next, we compute $H(Y \mid X)$ as follows:

$$H(Y \mid X) = \sum_{x} p(x) H(Y \mid X = x) = p_0 H(p_e) + (1 - p_0) H(1/2) = p_0 H(p_e) + 1 - p_0. \tag{5}$$

Combining (4) and (5), we have

$$I(X; Y) = H\Big(\frac{1}{2} + p_0\Big(\frac{1}{2} - p_e\Big)\Big) - p_0 H(p_e) - (1 - p_0).$$

As $p_e = (1 - d)/2$, we have

$$I(X; Y) = H\Big(\frac{1 + p_0 d}{2}\Big) - p_0 H\Big(\frac{1 - d}{2}\Big) - (1 - p_0).$$

We apply (14) (in the Appendix) to both entropy terms, with $d$ replaced by $p_0 d$ and by $-d$, respectively, and obtain

$$I(X; Y) = -\frac{p_0^2 d^2}{2 \log 2} + \frac{p_0 d^2}{2 \log 2} + O(d^4) \tag{6}$$

for small $d$. The last term $O(d^4)$ on the right side of (6) is negligible. Thus, $I(X; Y)$ approaches its maximum when $\partial I / \partial p_0 = 0$, i.e., when

$$p_0 = \frac{d^2 / (2 \log 2)}{d^2 / \log 2} = \frac{1}{2}.$$

Consequently, we estimate the channel capacity from (6) by

$$C \approx -\frac{1}{4} \cdot \frac{d^2}{2 \log 2} + \frac{1}{2}\Big(1 - H\Big(\frac{1 - d}{2}\Big)\Big) \approx -\frac{d^2}{8 \log 2} + \frac{d^2}{4 \log 2},$$

which is $d^2/(8 \log 2)$.

□
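The closed-form estimate can be cross-checked numerically. The sketch below maximizes $I(X; Y)$ of the non-symmetric channel over a uniform grid of $p_0$ values, a brute-force stand-in for the nonlinear optimization algorithms mentioned in the proof (the Blahut-Arimoto algorithm is the standard choice for general channels). Function names and the grid size are illustrative assumptions.

```python
from __future__ import annotations

import math


def h2(p: float) -> float:
    """Binary entropy function in bits."""
    return -sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


def mi_nonsymmetric(p0: float, d: float) -> float:
    """I(X;Y) for the channel with rows (1 - p_e, p_e) and (1/2, 1/2), p_e = (1 - d)/2."""
    p_e = (1.0 - d) / 2.0
    py0 = p0 * (1.0 - p_e) + (1.0 - p0) * 0.5  # Pr(Y = 0), cf. (4)
    h_y_given_x = p0 * h2(p_e) + (1.0 - p0)    # second row has entropy 1 bit, cf. (5)
    return h2(py0) - h_y_given_x


def capacity_nonsymmetric(d: float, grid: int = 100_000) -> tuple[float, float]:
    """Brute-force maximization of I(X;Y) over p0 on a uniform grid of [0, 1]."""
    best = max(range(grid + 1), key=lambda k: mi_nonsymmetric(k / grid, d))
    return best / grid, mi_nonsymmetric(best / grid, d)


for d in (0.2, 0.05, 0.01):
    p0, c = capacity_nonsymmetric(d)
    estimate = d * d / (8 * math.log(2))
    print(f"d={d}: p0={p0:.4f} C={c:.4e} d^2/(8 log 2)={estimate:.4e} ratio={c / estimate:.5f}")
```

For small $d$ the maximizing $p_0$ approaches $1/2$ and the ratio of the numerical capacity to $d^2/(8 \log 2)$ approaches 1, in line with the estimate above.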


Remark 1. In statistical cryptanalysis, Theorem 2 and Theorem 3 were known in slightly different contexts: the probability of error is a parameter, and the number of samples is known to be on the order of $1/d^2$. By asking for an arbitrarily low probability of error, we are able to give an alternative proof using channel capacity rather than relative entropy (or Kullback-Leibler distance). While the latter is the classical tool for solving hypothesis testing problems, here we show that hypothesis testing problems can be linked to channel capacity.


5 Sampling Theorems with Incomplete Signals 

In this section, we apply the hypothesis testing result (Theorem 3) to two sampling problems (the classical and the generic versions). Without loss of generality, we assume that the discrete statistical signals are not restricted to a particular application domain. Assume that the (possibly noise-corrupted) signals are $2^n$-valued and that noise is uniformly distributed. For the signal detection problem (i.e., testing for the presence of a real signal), we adopt the conventional approach of statistical hypothesis testing. Rather than using a direct signal detection method (as is done in specific application domains), we propose to test between the associated distribution and the uniform distribution.

We model the signal $F$ mathematically as follows. $F$ is an arbitrary (not necessarily deterministic) function. Let $X$ be the $n$-bit output sample of $F$, assuming that the input is random and uniformly distributed. Denote the output distribution of $X$ by $f$. Note that our general setting of discrete statistical signals is captured by the assumption that $F$ is an arbitrary yet fixed function.
First, the classical sampling problem (which can be interpreted as the classical distinguisher⁴) is formally stated as follows.

Theorem 4 (Classical Sampling Problem). Assume that the largest Walsh coefficient of $f$ is $d = \hat{f}(m_0)$ for a nonzero $n$-bit vector $m_0$. We can detect $F$ with an arbitrarily low probability of error using the minimum number $N = (8 \log 2)/d^2$ of samples of $F$, i.e., $O(1/d^2)$.

The proof follows easily by applying Theorem 3 to the Boolean variable $\langle m_0, X \rangle$, whose bias is $d$, and is omitted here. The classical sampling problem assumes that $F$, together with its characteristics (i.e., the largest Walsh coefficient $d$), is known a priori. It aims at detecting the signal with an arbitrarily low probability of error using the minimum number of samples.
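As a toy illustration of the classical setting of Theorem 4, where $F$ is known a priori, the sketch below uses a hypothetical source `toy_source` with a 3-bit input and a 2-bit output. It computes $f$ exactly by enumerating all inputs, locates the largest nontrivial Walsh coefficient with an FWT, and evaluates $N = (8 \log 2)/d^2$.

```python
from __future__ import annotations

import math
from collections import Counter


def toy_source(u: int) -> int:
    """Hypothetical source F: 3-bit u -> 2-bit output (bit 0 = u0 AND u1, bit 1 = u1 XOR u2)."""
    u0, u1, u2 = u & 1, (u >> 1) & 1, (u >> 2) & 1
    return (u0 & u1) | ((u1 ^ u2) << 1)


def fwt(x: list[float]) -> list[float]:
    """Unnormalized fast Walsh transform (standard in-place butterfly)."""
    y, h = list(x), 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            for k in range(start, start + h):
                y[k], y[k + h] = y[k] + y[k + h], y[k] - y[k + h]
        h *= 2
    return y


# Classical setting: F is known, so f is obtained exactly by enumerating all 8 inputs.
counts = Counter(toy_source(u) for u in range(8))
f = [counts[x] / 8 for x in range(4)]
coeffs = fwt(f)
m0 = max(range(1, 4), key=lambda m: abs(coeffs[m]))
d = coeffs[m0]
n_samples = 8 * math.log(2) / d**2   # Theorem 4
print(f, m0, d, round(n_samples, 2))  # [0.375, 0.125, 0.375, 0.125] 1 0.5 22.18
```

Here the mask $m_0 = 1$ selects the AND output bit, whose bias is $3/4 - 1/4 = 1/2$.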

Next, we formally present our main sampling theorem, which is more practical (and more widely applicable). Assuming that it is infeasible to know the signal $F$ a priori, we want to detect signals with an arbitrarily low probability of error and with a bounded sample size. Note that the sampled signal is incomplete (and possibly noisy) and that the associated distribution is noisy (i.e., not precise). We call this problem generic sampling with incomplete noisy signals. In contrast to the classical distinguisher, this result can be interpreted as a generalized distinguisher⁵ in the context of statistical cryptanalysis. We give our first result, for $n = 1$, below.

Theorem 5 (Generic Sampling Problem with n = 1). Assume that the sample size of $F$ is upper-bounded by $N$. Regardless of the input size of $F$, in order to detect $F$ with an arbitrarily low probability of error, it is necessary and sufficient that $f$ has a nontrivial Walsh coefficient $d$ with $|d| > c/\sqrt{N}$, where the constant $c = \sqrt{8 \log 2}$.

Proof. Note that the only nontrivial Walsh coefficient $d$ for $n = 1$ is $\hat{f}(1)$, which is nothing but the bias of $F$. First, we show by contradiction that the condition is necessary. That is, if we can identify $F$ with an arbitrarily low probability of error, then we must have $|d| > c/\sqrt{N}$. Suppose otherwise that $|d| \le c/\sqrt{N}$, i.e., $N \le (8 \log 2)/d^2$. Following the proof of Theorem 3, the error probability is then bounded away from zero as a consequence of (the converse of) Shannon's Channel Coding Theorem, which is a contradiction. Thus, the condition on $d$ is necessary. Next, we show that it is also sufficient. That is, if $|d| > c/\sqrt{N}$, then we can identify $F$ with an arbitrarily low probability of error. This follows directly from Theorem 4 with $n = 1$, which completes our proof. □

4 As mentioned in Remark 1, the problem statement of the classical distinguisher is slightly different; it often deals with a large $d$ (using a slightly different $N$) rather than the largest $d$ (cf. [19]).

5 With $n = 1$, this appears as an informal result in cryptanalysis, where it is used as a black-box analysis tool for several cryptosystems.

We now make a generalized proposition for $n > 1$, which includes Theorem 5 as a special case:

Proposition 1 (Generic Sampling Problem with n > 1). Assume that the sample size of $F$ is upper-bounded by $N$. Regardless of the input size of $F$, in order to detect $F$ with an arbitrarily low probability of error, it is necessary and sufficient that the following condition is satisfied:

$$\sum_{m \neq 0} \big(\hat{f}(m)\big)^2 \ge \frac{8 n \log 2}{N}.$$

We note that the sufficient condition can be proved using results on the classic distinguisher (i.e., the Squared Euclidean Imbalance), which uses the notion of relative distance and states that $\sum_{m \neq 0} (\hat{f}(m))^2 > (4 n \log 2)/N$ is required for success with high probability.
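The condition of Proposition 1 is straightforward to evaluate when $f$ is given. The sketch below (names are hypothetical) computes $\sum_{m \neq 0} \hat{f}(m)^2$ with an FWT, checks the condition for a sample budget $N$, and returns the smallest $N$ that satisfies it; for $n = 1$ it reduces to the condition of Theorem 5, up to the boundary case.

```python
from __future__ import annotations

import math


def fwt(x: list[float]) -> list[float]:
    """Unnormalized fast Walsh transform (standard in-place butterfly)."""
    y, h = list(x), 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            for k in range(start, start + h):
                y[k], y[k + h] = y[k] + y[k + h], y[k] - y[k + h]
        h *= 2
    return y


def nontrivial_energy(f: list[float]) -> float:
    """Sum of squared Walsh coefficients over all nonzero masks m."""
    return sum(c * c for c in fwt(f)[1:])


def condition_holds(f: list[float], n_samples: int) -> bool:
    """Proposition 1: sum_{m != 0} f_hat(m)^2 >= 8 n log 2 / N."""
    n = len(f).bit_length() - 1
    return nontrivial_energy(f) >= 8 * n * math.log(2) / n_samples


def min_samples(f: list[float]) -> float:
    """Smallest N allowed by Proposition 1 (infinite when f is uniform)."""
    n = len(f).bit_length() - 1
    energy = nontrivial_energy(f)
    return math.inf if energy == 0 else 8 * n * math.log(2) / energy


print(min_samples([0.75, 0.25]))                  # n = 1, |d| = 0.5: about 22.18
print(min_samples([0.375, 0.125, 0.375, 0.125]))  # n = 2: 64 log 2, about 44.36
```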

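According to Theorem 5 and Proposition 1, a real signal $F$ should have the following property, expressed through the $\ell_2$ norm of the Walsh transform of the associated distribution $f$, given the sample size $N$:

$$\|\hat{f}\|_2^2 \ge 1 + \frac{8 n \log 2}{N},$$

where the $\ell_2$ norm of $\hat{f}$ is defined as

$$\|\hat{f}\|_2 = \sqrt{\sum_{i \in GF(2)^n} \hat{f}(i)^2}.$$

Since $\hat{f}(0) = 1$, this is Proposition 1 restated; by Parseval's identity, $\|\hat{f}\|_2^2 = 2^n \sum_i f(i)^2$, so the condition can equally be read in the time domain.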
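Parseval's identity $\|\hat{f}\|_2^2 = 2^n \sum_i f(i)^2$ lets the squared norm be computed either in the transform domain or in the time domain. A short numerical sketch (names are ours):

```python
from __future__ import annotations


def fwt(x: list[float]) -> list[float]:
    """Unnormalized fast Walsh transform (standard in-place butterfly)."""
    y, h = list(x), 1
    while h < len(y):
        for start in range(0, len(y), 2 * h):
            for k in range(start, start + h):
                y[k], y[k + h] = y[k] + y[k + h], y[k] - y[k + h]
        h *= 2
    return y


def transform_norm_sq(f: list[float]) -> float:
    """||f_hat||_2^2, summed over all masks including the trivial one."""
    return sum(c * c for c in fwt(f))


def time_norm_sq_scaled(f: list[float]) -> float:
    """2^n * sum_i f(i)^2, the time-domain side of Parseval's identity."""
    return len(f) * sum(p * p for p in f)


f = [0.375, 0.125, 0.375, 0.125]
print(transform_norm_sq(f), time_norm_sq_scaled(f))  # 1.25 1.25
```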

By the duality of time-domain and transform-domain signals, we make another proposition following Proposition 1:

Proposition 2. Discrete statistical signals can be characterized by the large Walsh coefficients of the associated distribution.

Proposition 2 implies that the most significant transform-domain signals are the largest coefficients in our generalized model. This is a known fact in application domains such as images and voice. However, for those signals the Walsh transform is applied directly to the time-domain samples, rather than to the distribution of the collected samples as in our model; analogously to Proposition 2, it is known that those signals can also be characterized by large Walsh coefficients.
5.1 Cryptographic Significance on Sparse Walsh Transforms

In symmetric cryptanalysis, Walsh transforms play an essential role, including in bias computation.

Following the recent successful development of compressive sensing [7], it has been shown, surprisingly, that the sparse Fourier transform significantly outperforms the FFT (Fast Fourier Transform). For problem size $N$, $k$-sparse Fourier transforms ($k \ll N$) aim to compute the $k$ non-zero or large coefficients faster than the FFT, when the remaining $(N - k)$ coefficients are zero or negligibly small. For instance, with $N = 2^{28}$ and $k = 50$, the theoretical estimate of the time complexity of the FFT is $N \cdot \log_2 N \approx 7 \times 10^9$ units, while for sparse Fourier transforms the estimated theoretical complexity is $10^7$ units, i.e., a large reduction factor of about 700 is obtained.
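The FFT figure quoted above is simple arithmetic; a sketch reproducing it (the $10^7$ sparse estimate is taken from the text, not computed):

```python
import math

N = 2**28
fft_units = N * math.log2(N)   # N log2 N = 28 * 2^28 time units for the FFT
sparse_units = 1e7             # sparse Fourier estimate quoted in the text
print(f"{fft_units:.3e}", round(fft_units / sparse_units))  # 7.516e+09 752
```

This gives $N \log_2 N \approx 7.5 \times 10^9$ and a ratio of about 750, which the text rounds to $7 \times 10^9$ and 700.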

Due to the similarity between the Fourier transform and the Walsh transform, research on the sparse Walsh transform has followed more recently. As an illustration, assume $k$ non-zero coefficients and $(N - k)$ zero coefficients in a simplified model. With the same parameters ($N = 2^{28}$, $k = 50$) as above, the conservative theoretical time complexity⁶ of the sparse Walsh transform is around 38000 units. This time unit is not comparable to the one used for the FWT (i.e., $7 \times 10^9$ units). Nonetheless, we estimate a rough reduction factor of 8000. Additionally, for $k = 2, 4, 12, 25$, the sparse Walsh transform [16] has estimated times of 1600, 3000, 8200, and 16400 units, respectively.

Based on the discussion in this section, it is natural to link the first key challenge to the generic approach of sparse Walsh transforms. In [12, 13], finding the largest Walsh coefficient is linked to the maximum likelihood decoding problem for linear codes, which is known to be NP-complete. Assume $k$ large coefficients and $(N - k)$ zero or negligibly small ones in a general setting. It seems that, other than the FWT, no efficient algorithms exist to compute sparse Walsh transforms. In contrast, in the simplified $k$-sparse model, the theoretical estimates of the time complexity for $k = 1, 2$ are $(\log_2 N)^2$ and $2(\log_2 N)^2$, respectively. That is, the complexity is on the order of $(\log_2 N)^2$ (resp. $N \log_2 N$) in the simplified model (resp. the general model). We are working on approximate signal recovery in the presence of noise to gain more insight into the first challenge.
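Why exact sparsity helps can be seen in a toy, noiseless $k = 1$ case that reads only $n + 1 = \log_2 N + 1$ time-domain components. This is our own simplified illustration (the function name is hypothetical), not the algorithm of [16], and it does not address noise, which is the harder case discussed above. It assumes $x = 2^{-n} W y$ with $y$ having a single nonzero entry $a$ at position $s$, so that $x_j = (a/2^n)(-1)^{\langle s, j \rangle}$.

```python
from __future__ import annotations

from typing import Callable


def recover_one_sparse(query: Callable[[int], float], n: int) -> tuple[int, float]:
    """Toy, noiseless 1-sparse Walsh recovery from n + 1 time-domain accesses.

    Assumes x_j = (a / 2^n) * (-1)^<s, j>, i.e. the Walsh transform of x is a
    times the indicator of position s.
    """
    x0 = query(0)
    s = 0
    for b in range(n):
        # Bit b of s is 1 exactly when the sign flips between x_0 and x_{e_b}.
        if query(1 << b) * x0 < 0:
            s |= 1 << b
    return s, x0 * 2**n


n, s_true, a = 10, 0b1011001101, 3.0
accessed = []


def query(j: int) -> float:
    """Hypothetical time-domain oracle that records which components are read."""
    accessed.append(j)
    return (a / 2**n) * (-1) ** (bin(s_true & j).count("1") & 1)


print(recover_one_sparse(query, n), len(accessed))  # (717, 3.0) 11
```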

6 The number of time-domain components that must be accessed is around 6700 (see [16, Theorem 1]), rather than $N$ as for the FWT.
6 Concluding Remarks


We model general discrete statistical signals as the output samples of an unknown, arbitrary yet fixed function (the signal source). We translate Shannon's channel coding theorem in the extremal case of a binary channel to solve a hypothesis testing problem. Because of its high probability of transmission error, this extremal binary channel is rare in communication theory. Nonetheless, the translated result allows us to solve a generic sampling problem, in which we know nothing about the signal source a priori and can only afford a bounded number of sampling measurements. Our main results demonstrate that the classical signal processing tool of the Walsh transform is essential: it is the large Walsh coefficient(s) that characterize(s) discrete statistical signals, regardless of the signal sources. Using Shannon's theorem, we establish the necessary and sufficient condition for the generic sampling problem under the general assumption of statistical signal sources. This shows a strong connection between Shannon's theorem and the Walsh transform, both of which are key technologies in digital signal processing. Our results can also be seen as a generalization of the classic distinguisher, which is based on relative distance and is the standard tool for statistical hypothesis testing problems. Finally, based on our preliminary work on sparse Walsh transforms in the context of compressive sensing, we discuss the cryptographic significance.
Appendix: Proof of Corollary 1

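Let $p = (1 + d)/2$, so that $|d| \le 1$. For $|d| < 1$, we will first show

$$H\Big(\frac{1 + d}{2}\Big) = 1 - \frac{1}{\log 2}\Big(\frac{d^2}{2} + \frac{d^4}{12} + \frac{d^6}{30} + \frac{d^8}{56} + O(d^{10})\Big). \tag{7}$$

By the definition of entropy,

$$H\Big(\frac{1 + d}{2}\Big) = 1 - \frac{(1 + d)\log(1 + d) + (1 - d)\log(1 - d)}{2 \log 2} = 1 - \frac{1}{2 \log 2}\Big[\log(1 - d^2) + d \log\frac{1 + d}{1 - d}\Big]. \tag{11}$$

Using Taylor series expansions for $0 < d < 1$, we have

$$\log(1 - d^2) = -\Big(d^2 + \frac{d^4}{2} + \frac{d^6}{3} + \frac{d^8}{4} + \cdots\Big), \tag{12}$$

$$\log\frac{1 + d}{1 - d} = 2\Big(d + \frac{d^3}{3} + \frac{d^5}{5} + \frac{d^7}{7} + \cdots\Big). \tag{13}$$

Putting (12) and (13) into (11), we have

$$H\Big(\frac{1 + d}{2}\Big) = 1 - \frac{1}{2 \log 2}\Big[2\Big(d^2 + \frac{d^4}{3} + \frac{d^6}{5} + \frac{d^8}{7} + \cdots\Big) - \Big(d^2 + \frac{d^4}{2} + \frac{d^6}{3} + \frac{d^8}{4} + \cdots\Big)\Big] = 1 - \frac{1}{\log 2}\Big(\frac{d^2}{2} + \frac{d^4}{12} + \frac{d^6}{30} + \frac{d^8}{56} + \cdots\Big),$$

which leads to (7) for $0 < d < 1$. For $-1 < d < 0$, we use the symmetry of entropy, $H\big(\frac{1 + d}{2}\big) = H\big(\frac{1 - d}{2}\big)$, and apply the above result to establish (7) for all $|d| < 1$ (the case $d = 0$ is trivial).

Note that if $|d| \ll 1$, (7) reduces to

$$H\Big(\frac{1 + d}{2}\Big) = 1 - \frac{d^2}{2 \log 2} + O(d^4). \tag{14}$$

So, we can calculate $C$ in (3) by

$$C = 1 - H\Big(\frac{1 + d}{2}\Big) = \frac{d^2 + O(d^4)}{2 \log 2} \approx \frac{d^2}{2 \log 2},$$

which completes our proof.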